De Do Do Do, De Di Di Di - Panel Data Methods

Data Analytics for Finance

Caspar David Peter

Rotterdam School of Management, Accounting Department

Panel Data Methods

Overview

Today’s learning objectives

  • What are good and bad controls in regression analysis?
  • Understand the Difference-in-Differences (DiD) framework for causal inference using panel data
  • Learn how to implement DiD design, including fixed effects
  • Recognize the importance of appropriate standard error adjustments in DiD analyses
  • Explore advanced DiD topics such as staggered adoption and recent methodological developments

Overview

Recap of previous lecture

What to do if random assignment is not possible?

We did not answer this question in the previous lecture!

What to do if random assignment is not possible?

Building the OLS model

OLS model building blocks

Control Variables in Regression Analysis

Control Variables in Regression Analysis

Good vs. Bad controls

Good controls

  • Confounders: variables that affect both the treatment and the outcome (“common causes”)
  • Controlling for them helps reduce omitted variable bias
  • Variables that only predict the outcome improve the precision of estimates

Bad controls

  • Variables that are affected by the treatment (mediators)
  • Variables that are affected by both the treatment and the outcome (colliders)
  • Controlling for them can introduce bias into estimates

Control Variables in Regression Analysis

Good controls - Examples

Confounder - “Common cause”

“Hidden common cause”

DiLLMA example: controlling for study hours and attendance rate

Control Variables in Regression Analysis

Bad controls - Examples

Mediator

  • Variables that lie on the causal path between treatment and outcome
  • Controlling for a mediator blocks part of the treatment effect, leading to biased estimates

Collider

  • Variables that are influenced by both treatment and outcome
  • Conditioning on a collider opens a non-causal path, introducing spurious associations and bias

Control Variables in Regression Analysis

Bad controls - Example

Mediator

  • Variables that lie on the causal path between treatment and outcome
  • Controlling for a mediator blocks part of the treatment effect, leading to biased estimates

Simple Mediator Example

  • Bias arises from controlling for a variable that lies on the causal path X → C → Y
  • If we include reported earnings as a control in a regression of market return on accounting choices, we remove part of the effect of accounting on returns.

Difference-in-Differences (DiD)

Difference-in-Differences (DiD)

Key concepts

What is Difference-in-Differences (DiD)?

Difference-in-Differences (DiD) is a quasi-experimental method that exploits within-group variation over time and cross-group variation to identify a causal effect when random assignment is infeasible.

Difference-in-Differences (DiD)

Why Differences - Isn’t one difference enough?

Before vs After (time variation)

What is it?

Compare outcomes before and after treatment implementation, e.g. pre- and post-policy change

Why not enough for causal inference?

All variation in Treatment is explained by Time - any other change over time (e.g., a common shock) would be attributed to the treatment!

Treated vs Control (group variation)

What is it?

Compare outcomes between treated and control groups, e.g. those affected by a policy change vs those not affected

Why not enough for causal inference?

Differences between Treated and Control groups may be driven by time-invariant confounders, e.g. ability, demographics, location, etc.

Combining both differences isolates the causal impact of the treatment, a.k.a. the average treatment effect on the treated (ATT)

Difference-in-Differences (DiD)

Why Differences - Isn’t one difference enough?

A picture is worth a thousand words

DiD only recovers the causal effect if the “parallel trends assumption” holds!

Difference-in-Differences (DiD)

DiLLMa - Setting continued

  • We observe students’ exam scores before and after LLMs were introduced
  • Some courses allowed LLM use (treatment), others banned it (control)
  • Goal: Estimate the causal effect of allowing LLM use on exam scores

In short

Compare grade changes in allowed vs. banned courses, before and after LLMs became available

DiD isolates the treated group’s response, conditional on the assumption that the untreated group’s changes represent the non-treatment counterfactual for the treated group

Difference-in-Differences (DiD)

Canonical DiD model - The two-by-two design

The two-by-two set-up

                (1) After                  (2) Before                  (1) - (2)
(a) Treatment   \(Y_{treated,\ after}\)    \(Y_{treated,\ before}\)    \(\Delta_{treated}\)
(b) Control     \(Y_{control,\ after}\)    \(Y_{control,\ before}\)    \(\Delta_{control}\)
(a) - (b)       \(\Delta_{after}\)         \(\Delta_{before}\)         DiD

A typical DiD regression looks like this

\[Y = \beta_0 + \beta_1 Treated + \beta_2 After + \beta_3 Treated \times After + \epsilon\]

  • The difference-in-differences regression gives you the same estimate as if you took differences in the group averages

  • It also takes care of any unobserved constant differences between groups and of common time trends!
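
The equivalence between the regression and the table of group means can be checked directly. A minimal Python sketch (numpy only, simulated data; the group sizes, effect sizes, and the true ATT of 1.5 are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200  # observations per group-period cell (illustrative)

# Simulate the two-by-two design: group effect, time effect, true ATT = 1.5
treated = np.repeat([0, 0, 1, 1], n)
after = np.tile(np.repeat([0, 1], n), 2)
y = 5 + 2.0 * treated + 0.5 * after + 1.5 * treated * after + rng.normal(0, 1, 4 * n)

# DiD from group means: (treated after - before) - (control after - before)
did_means = (y[(treated == 1) & (after == 1)].mean()
             - y[(treated == 1) & (after == 0)].mean()) \
          - (y[(treated == 0) & (after == 1)].mean()
             - y[(treated == 0) & (after == 0)].mean())

# Same estimate from the interaction coefficient in the OLS regression
X = np.column_stack([np.ones_like(y), treated, after, treated * after])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

assert np.isclose(did_means, beta[3])  # beta_3 equals the difference in differences
```

Because the regression is saturated (four cells, four coefficients), the interaction coefficient reproduces the table of means exactly.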

Difference-in-Differences (DiD)

Canonical DiD model - The two-by-two design

The two-by-two set-up

                (1) After                                   (2) Before              (1) - (2)
(a) Treatment   \(\beta_0 + \beta_1 + \beta_2 + \beta_3\)   \(\beta_0 + \beta_1\)   \(\beta_2 + \beta_3\)
(b) Control     \(\beta_0 + \beta_2\)                       \(\beta_0\)             \(\beta_2\)
(a) - (b)       \(\beta_1 + \beta_3\)                       \(\beta_1\)             \(\beta_3\)

A typical DiD regression looks like this

\[Y = \beta_0 + \beta_1 Treated + \beta_2 After + \beta_3 Treated \times After + \epsilon\]

  • The difference-in-differences regression gives you the same estimate as if you took differences in the group averages

  • It also takes care of any unobserved constant differences between groups and of common time trends!

Difference-in-Differences (DiD)

Canonical DiD model - The two-by-two design

Example data summary

  • Same variables as in previous lecture, but now panel data with multiple observations per student
  • Treatment assigned at the course level (some instructors allow LLM use, others ban it)
  • Data contains two periods: pre-LLM (before) and post-LLM (after), each consisting of two exam scores per student
  • The unit of analysis is the student-time level
  • The key variables are:
    • treated: Indicator for whether the course allows LLM use (1 = yes, 0 = no)
    • after: Indicator for whether the observation is from the post-LLM period (1 = after, 0 = before)
    • exam_score: The student’s exam score

Difference-in-Differences (DiD)

Canonical DiD model - The two-by-two design

Comparing means

                 After    Before    Difference
Treated group     6.57      7.12         -0.55
Control group     7.42      7.22          0.20
Difference       -0.85     -0.10         -0.75
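
The DiD estimate in the table is just arithmetic on the four cell means; a quick check in Python:

```python
# Cell means from the comparison-of-means table
treated_after, treated_before = 6.57, 7.12
control_after, control_before = 7.42, 7.22

delta_treated = treated_after - treated_before   # -0.55
delta_control = control_after - control_before   #  0.20
did = delta_treated - delta_control

print(round(did, 2))  # -0.75
```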

Estimation

Difference-in-Differences (DiD)

2-way fixed effects model

\[Y = \alpha_i + \alpha_t + \beta_3 Treated \times After + \epsilon\]

What is absorbed by fixed effects?

  • \(\beta_0\) disappears - not meaningful in panel data with fixed effects
  • \(\beta_1\) disappears - time-invariant differences are absorbed by individual fixed effects
  • \(\beta_2\) disappears - time fixed effects capture common shocks over time

Why use fixed effects?

  • Controls for unobserved heterogeneity across individuals and over time
  • Focuses on within-individual variation to identify treatment effects
  • More robust to omitted variable bias from time-invariant confounders
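
For a balanced panel, the two-way fixed effects estimate of \(\beta_3\) can be computed by the within transformation: demean by unit and by time, add back the grand mean. A numpy sketch with simulated data (panel dimensions, effect sizes, and the true ATT of 1.5 are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)
n_units, n_periods = 100, 4  # balanced panel (illustrative)

unit = np.repeat(np.arange(n_units), n_periods)
time = np.tile(np.arange(n_periods), n_units)
treated = (unit < n_units // 2).astype(float)   # first half of units treated
after = (time >= n_periods // 2).astype(float)  # treatment starts mid-sample
d = treated * after                             # DiD interaction

alpha_i = rng.normal(0, 2, n_units)[unit]       # unit fixed effects (e.g., ability)
alpha_t = np.linspace(0, 1, n_periods)[time]    # common time shocks
y = alpha_i + alpha_t + 1.5 * d + rng.normal(0, 1, len(unit))

def twoway_demean(v):
    # Within transformation for a balanced panel:
    # subtract unit means and time means, add back the grand mean
    v = v.reshape(n_units, n_periods)
    out = v - v.mean(axis=1, keepdims=True) - v.mean(axis=0, keepdims=True) + v.mean()
    return out.ravel()

y_dm, d_dm = twoway_demean(y), twoway_demean(d)
beta3 = (d_dm @ y_dm) / (d_dm @ d_dm)  # OLS slope on the demeaned data ≈ 1.5
```

Note that the simulated ability term `alpha_i` never enters the estimate: it is wiped out by the unit demeaning, exactly as the slide claims.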

Difference-in-Differences (DiD)

2-way fixed effects model

Estimation with fixed effects

What changes with fixed effects?

  • The DiD estimate remains similar, indicating robustness to fixed effects
  • Controls for unobserved individual heterogeneity and time effects
  • Time-invariant confounders are addressed (ability drops out!)
  • But what about standard errors?

Difference-in-Differences (DiD)

Identification

Unconditional means

Event study approach

The parallel trends assumption states that, in the absence of treatment, the average change in the outcome variable would have been the same for both the treatment and control groups.
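
The unconditional-means event study can be sketched as follows: compute the treated-minus-control mean gap in each period and normalize it to the last pre-treatment period. Flat pre-period gaps support parallel trends. A numpy sketch with simulated data (panel dimensions, treatment timing, and the true ATT of 1.5 are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_periods = 200, 6
treat_start = 3                                  # treatment begins in period 3

unit = np.repeat(np.arange(n_units), n_periods)
time = np.tile(np.arange(n_periods), n_units)
treated = (unit < n_units // 2).astype(float)
effect = 1.5 * treated * (time >= treat_start)   # true ATT = 1.5 from period 3 on

# Parallel trends hold by construction: common time trend, constant group gap
y = 2.0 * treated + 0.5 * time + effect + rng.normal(0, 1, len(unit))

# Period-by-period treated-minus-control gap, normalized to the last pre-period
gaps = np.array([y[(time == t) & (treated == 1)].mean()
                 - y[(time == t) & (treated == 0)].mean()
                 for t in range(n_periods)])
event_study = gaps - gaps[treat_start - 1]

# Pre-period estimates should hover near zero; post-period estimates near the ATT
print(np.round(event_study, 2))
```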

Difference-in-Differences (DiD)

Standard errors in DiD
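
When errors are correlated within clusters (e.g., repeated scores within a course, or serially correlated shocks over time), iid standard errors understate the uncertainty of the DiD estimate. A numpy sketch comparing iid and cluster-robust (“sandwich”) standard errors, using simulated data with a course-by-period shock (all cluster counts and parameters are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n_clusters, n_per = 40, 20                      # e.g., 40 courses, 20 scores each
N = n_clusters * n_per

cluster = np.repeat(np.arange(n_clusters), n_per)
treated = (cluster < n_clusters // 2).astype(float)
after = np.tile(np.repeat([0.0, 1.0], n_per // 2), n_clusters)
X = np.column_stack([np.ones(N), treated, after, treated * after])

# A course-level shock hitting the post period makes errors correlated within
# clusters - the kind of serial correlation that plagues DiD standard errors
shock = rng.normal(0, 1, n_clusters)[cluster] * after
y = X @ np.array([5.0, 1.0, 0.5, 1.5]) + shock + rng.normal(0, 1, N)

beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta
XtX_inv = np.linalg.inv(X.T @ X)

# Classical iid variance
sigma2 = resid @ resid / (N - X.shape[1])
se_iid = np.sqrt(np.diag(sigma2 * XtX_inv))

# Cluster-robust variance: sum the score outer products cluster by cluster
meat = np.zeros((4, 4))
for g in range(n_clusters):
    sg = X[cluster == g].T @ resid[cluster == g]
    meat += np.outer(sg, sg)
se_cluster = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

# In this simulation the clustered SE for the DiD coefficient (index 3) is
# roughly twice the naive iid SE - ignoring clustering overstates precision
print(se_iid[3], se_cluster[3])
```

In practice one would cluster at the level of treatment assignment (here, the course) rather than compute the sandwich by hand.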

Difference-in-Differences (DiD)

Takeaways so far

  • DiD leverages both cross-sectional and time-series variation to identify causal effects (ATT)
  • The canonical DiD model can be extended with fixed effects to control for time-invariant heterogeneity
  • The parallel trends assumption is crucial for valid DiD inference
  • Proper standard error adjustments are essential for accurate inference in DiD settings
  • Is that all? Not quite…

Limitations

Limitations arise in rollout (staggered) designs, where treatment timing varies across groups; TWFE can perform poorly in such settings (to be discussed later).

Difference-in-Differences (DiD)

Standard errors in DiD - Issues

Difference-in-Differences (DiD)

Standard errors in DiD - A new hope

Difference-in-Differences (DiD)

Beyond your MSc replication - Your contribution

New approaches

How to learn about the new approaches?

Baker, Larcker, and Wang (2022)

How to learn about applying them?

A way to extend your MSc replication project is to apply some of the new DiD methods to the DiLLMa dataset and compare the results with the traditional TWFE approach.

Thank You for Your Attention!

See You in the Next One!

References

Baker, Andrew C., David F. Larcker, and Charles C. Y. Wang. 2022. “How Much Should We Trust Staggered Difference-in-Differences Estimates?” Journal of Financial Economics 144 (2): 370–95.
Breuer, Matthias, and Ed deHaan. 2024. “Using and Interpreting Fixed Effects Models.” Journal of Accounting Research 62 (4): 1183–1226.
Cinelli, Carlos, Andrew Forney, and Judea Pearl. 2024. “A Crash Course in Good and Bad Controls.” Sociological Methods & Research 53 (3): 1071–1104.
Huntington-Klein, Nick. 2022. The Effect: An Introduction to Research Design and Causality. 2nd ed. Chapman & Hall/CRC.
Verbeek, Marno. 2021. Panel Methods for Finance: A Guide to Panel Data Econometrics for Financial Applications. De Gruyter.